IMPROVING SPATIAL QUALITY OF CODED PICTURES USING LAYERED SCALABLE VIDEO
BIT STREAMS

Field of the Invention

This invention relates to video signals, and in particular to layered scalable video
bit streams.

Background of the Invention 

A 'video signal' consists of a sequence of images. Each image is referred to as a
'frame'. When a video signal is transmitted from one location to another, it is
typically transmitted as a sequence of pictures. Each frame may be sent as a
single picture; however, the system may need to send more than one picture to
transmit all the information in one frame.

Increasingly, video signals are being transmitted over radio communication links.
This transmission may be over a communication path of very limited bandwidth,
for example over a communication channel between a portable or mobile radio
device and a base station of a cellular communications system.

One method of reducing the bandwidth required for transmission of video is to
perform particular processing of the video signal prior to transmission. However,
the quality of a video signal can be affected during coding or compression of the
video signal. For this reason, methods have been developed to enhance the
quality of the received signal following decoding and/or decompression.

It is known, for example, to include additional 'layers' of transmission, beyond
simply the base layer in which pictures are transmitted. The additional layers are
termed 'enhancement layers'. The basic video signal is transmitted in the base
layer. The enhancement layers contain sequences of pictures that are transmitted
in addition to the basic set of pictures. These additional pictures are then used by
a receiver to improve the quality of the video. The pictures transmitted in the
enhancement layers may be based on the difference between the actual video
signal and the video bit stream after it has been encoded by the transmitter.

The base layer of video transmission typically contains two types of picture. The
first is an 'Intracoded' picture, which is often termed an I-picture. The important
feature of an I-picture is that it contains all the information required for a receiver to
display the current frame of the video sequence. When it receives an I-picture, the
receiver can display the frame without using any data about the video sequence
that it has received previously.

The second is a P-picture, which contains data about the differences between one
frame of the video sequence and a previous frame. Thus a P-picture constitutes an
'update'. When it receives a P-picture, a receiver displays a frame that is based on
both the P-picture and data that it already holds about the video stream from
previously received pictures.

If a video system employs one or more enhancement layers, then it can send a
variety of different types of picture in the enhancement layer. One of these types is
a 'B-picture'. A B-picture differs from both I- and P-pictures. A B-picture is
predicted based on information from both a picture that precedes the B-picture in
time in the video stream and one that follows it. The B-picture is said to be
'bi-directionally predicted'. This is illustrated in figure 1 of the appended drawings.

A B-picture is predicted based on pictures from the layer below it. Thus a system
with a base layer and a single enhancement layer will predict B-pictures based on
earlier and later pictures in the base layer, and transmit these B-pictures in the
enhancement layer. A notable feature of B-pictures is that they are disposable:
the receiver does not have to have them in order to display the video sequence. In
this sense they differ from P-pictures, which are also predicted, but are necessary
for the receiver to reconstruct the video sequence. A further difference lies in the
fact that B-pictures cannot serve as the basis for predicting further pictures.



WO 02/05563    PCT/EP01/07885

The pictures transmitted in the enhancement layers are an optional enhancement,
since the transmission scheme always allows a receiver to reconstruct the
transmitted video stream using only the pictures contained in the base layer.
However, any systems that have sufficient transmission bandwidth can be
arranged to use these enhancement layers. Typically, the base layer requires a
relatively low transmission bandwidth, and the enhancement layers require a
greater bandwidth. An example of typical transmission bandwidths is given in
connection with the discussion of the invention as illustrated in figures 8 and 9.

This hierarchy of base-layer pictures and enhancement pictures, partitioned into
one or more layers, is referred to as a layered scalable video bit stream.



In a layered scalable video bit stream, enhancements can be added to the base
layer by one or more of three techniques. These are:

(i) Spatial scalability. This involves increasing the resolution of the picture.

(ii) SNR scalability. This involves including error information to improve the
signal-to-noise ratio (SNR) of the picture.

(iii) Temporal scalability. This involves including extra pictures to increase the
frame rate.

The term 'hybrid scalability' implies using more than one of the techniques above
in encoding of the video stream.

Enhancements can be made to the whole picture. Alternatively, the enhancements
can be made to an arbitrarily shaped object within the picture, which is termed
'object-based' scalability.

The temporal enhancement layer is disposable, since a receiver can still
reconstruct the video stream without the pictures in the enhancement layer. In order
to preserve the disposable nature of the temporal enhancement layer, the H.263+
standard dictates that pictures included in the temporal scalability mode must be
bi-directionally predicted (B) pictures. This means that they are predicted based on
both the image that immediately precedes them in time and on the image which
immediately follows them.

If a three layer video bit stream is used, the base layer (layer 1) will include
intra-coded pictures (I pictures). These I-pictures are sampled, coded or compressed
from the original video signal pictures. Layer 1 will also include a plurality of
predicted inter-coded pictures (P pictures). In the enhancement layers (layers 2
or 3 or more), three types of picture may be used for scalability: bi-directionally
predicted (B) pictures; enhanced intra (EI) pictures; and enhanced predicted (EP)
pictures. EI pictures may contain SNR enhancements to pictures in the base layer,
but may instead be a spatial scalability enhancement.



The three basic methods of scalability will now be explained in more detail. 

Temporal Scalability

Temporal scalability is achieved using bi-directionally predicted pictures, or
B-pictures. These B-pictures are predicted from previous and subsequent
reconstructed pictures in the reference layer. This property generally results in
improved compression efficiency as compared to that of P pictures.

B pictures are not used as reference pictures for the prediction of any other
pictures. This property allows for B-pictures to be discarded if necessary without
adversely affecting any subsequent pictures, thus providing temporal scalability.
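The disposable nature of B pictures can be illustrated with a minimal Python sketch. The `Picture` class, its fields and the `drop_b_pictures` helper are hypothetical names introduced only for this illustration; they are not taken from any standard or from the apparatus described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Picture:
    kind: str    # "I", "P" or "B" (illustrative labels only)
    time: float  # display time, in units of the base-layer frame period


def drop_b_pictures(stream):
    """Return the stream with every B picture discarded.

    Because no other picture is predicted from a B picture, the remaining
    I and P pictures can still be decoded; only the frame rate is reduced.
    """
    return [p for p in stream if p.kind != "B"]


stream = [
    Picture("I", 0), Picture("B", 0.5),
    Picture("P", 1), Picture("B", 1.5),
    Picture("P", 2),
]
reduced = drop_b_pictures(stream)
```

Under these assumptions, `reduced` holds only the I and P pictures at times 0, 1 and 2, which corresponds to a receiver that ignores the temporal enhancement layer.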

Figure 1 shows a sequence of pictures plotted against time on the x-axis. Figure 1
illustrates the predictive structure of P and B pictures.

SNR Scalability

The other basic method to achieve scalability is through spatial/SNR
enhancement. Spatial scalability and SNR scalability are equivalent, except for
the use of interpolation as is described shortly. Because compression introduces
artifacts and distortions, the difference between a reconstructed picture and its
original in the encoder is nearly always a nonzero-valued picture, containing what
can be called the coding error. Normally, this coding error is lost at the encoder
and never recovered. With SNR scalability, these coding error pictures can also
be encoded and sent to the decoder. This is shown in Figure 2. These coding error
pictures produce an enhancement to the decoded picture. The extra data serves
to increase the signal-to-noise ratio of the video picture, hence the term SNR
scalability.

Figure 3 illustrates the data flow for SNR scalability. The vertical arrows from the
lower layer illustrate that the picture in the enhancement layer is predicted from a
reconstructed approximation of that picture in the reference (lower) layer.

Figure 2 shows a schematic representation of an apparatus for conducting SNR
scalability. In the figure, a video picture F0 is compressed, at 1, to produce the
base layer bit stream signal to be transmitted at a rate r1 kbps. This signal is
decompressed, at 2, to produce the reconstructed base layer picture F0'.

The compressed base layer bit stream is also decompressed, at 3, in the
transmitter. This decompressed bit stream is compared with the original picture F0,
at 4, to produce a difference signal 5. This difference signal is compressed, at 6,
and transmitted as the enhancement layer bit stream at a rate r2 kbps. This
enhancement layer bit stream is decompressed at 7 to produce the enhancement
layer picture F0''. This is added to the reconstructed base layer picture F0' at 8 to
produce the final reconstructed picture F0'.
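The data flow of Figure 2 can be sketched in Python as follows. This is only an illustration under stated assumptions: a uniform quantiser stands in for the unspecified compression stages, pictures are flat lists of sample values, and all function names and step sizes are hypothetical.

```python
def compress(samples, step):
    """Stand-in for a lossy coder (assumption): uniform quantisation."""
    return [round(s / step) for s in samples]


def decompress(indices, step):
    """Inverse of the stand-in coder: scale the indices back up."""
    return [i * step for i in indices]


def snr_scalable_encode(original, base_step, enh_step):
    base_bits = compress(original, base_step)             # stage 1
    base_recon = decompress(base_bits, base_step)         # stage 3, in the transmitter
    difference = [o - r for o, r in zip(original, base_recon)]  # stage 4, signal 5
    enh_bits = compress(difference, enh_step)             # stage 6
    return base_bits, enh_bits


def snr_scalable_decode(base_bits, enh_bits, base_step, enh_step):
    base_recon = decompress(base_bits, base_step)         # stage 2
    enh_recon = decompress(enh_bits, enh_step)            # stage 7
    final = [b + e for b, e in zip(base_recon, enh_recon)]  # adder 8
    return base_recon, final


original = [12, 47, 130, 201, 255]
base_bits, enh_bits = snr_scalable_encode(original, base_step=16, enh_step=4)
base_only, enhanced = snr_scalable_decode(base_bits, enh_bits, base_step=16, enh_step=4)
```

With these illustrative step sizes, the picture rebuilt from both layers lies closer to the original than the picture rebuilt from the base layer alone, which is the sense in which the enhancement layer raises the signal-to-noise ratio.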

If prediction is only formed from the lower layer, then the enhancement layer
picture is referred to as an EI picture. An EI picture may provide an SNR
enhancement on the base layer, or may provide a spatial scalability enhancement.
It is possible, however, to create a modified bi-directionally predicted picture using
both a prior enhancement layer picture and a temporally simultaneous lower layer
reference picture. This type of picture is referred to as an EP picture or
"Enhancement" P-picture.

The prediction flow for EI and EP pictures is shown in Figure 3. Although not
specifically shown in Figure 3, an EI picture in an enhancement layer may have a
P picture as its lower layer reference picture, and an EP picture may have an I
picture as its lower-layer enhancement picture.

For both EI and EP pictures, the prediction from the reference layer uses no
motion vectors. However, as with normal P pictures, EP pictures use motion
vectors when predicting from their temporally-prior reference picture in the same
layer.

Spatial Scalability

The third and final scalability method is spatial scalability, which is closely related
to SNR scalability. The only difference is that before the picture in the reference
layer is used to predict the picture in the spatial enhancement layer, it is
interpolated by a factor of two. This interpolation may be performed either
horizontally or vertically (1-D spatial scalability), or both horizontally and vertically
(2-D spatial scalability). Spatial scalability is shown in Figure 4.
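The factor-of-two interpolation can be sketched in Python as follows. The interpolation filter is not specified in this description, so sample repetition is assumed here purely for illustration; a practical coder would normally use a smoother filter. The function names are hypothetical.

```python
def upsample_horizontal(picture):
    """1-D spatial scalability: double the width by repeating each sample."""
    return [[s for s in row for _ in range(2)] for row in picture]


def upsample_vertical(picture):
    """1-D spatial scalability: double the height by repeating each row."""
    return [list(row) for row in picture for _ in range(2)]


def upsample_2d(picture):
    """2-D spatial scalability: double both the width and the height."""
    return upsample_vertical(upsample_horizontal(picture))


reference = [[1, 2],
             [3, 4]]
prediction = upsample_2d(reference)
```

Here `prediction` is a 4 x 4 picture built from the 2 x 2 reference-layer picture; the enhancement layer would then carry the difference between the full-resolution original and this interpolated prediction.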

The three basic scalability modes, temporal, SNR and spatial scalability, can be
applied to any arbitrarily shaped object within the picture, including the case where
the object is rectangular and covers the whole frame. This is known as object
based scalability.



SNR scalability is more efficient at lower bit rates, and temporal scalability more
efficient when there is a higher bandwidth available. To take advantage of this fact,
a hybrid scalability model has been developed. This is described in "H.263
Scalable Video Coding and Transmission at Very Low Bitrates", PhD Dissertation,
Faisal Ishtiaq, Northwestern University, Illinois, USA, December 1999. This model
consists of a base layer (layer 1), followed by an SNR enhancement layer (layer
2), then a further enhancement layer (layer 3). In layer 3, a dynamic choice is
made between SNR or temporal mode. This choice between SNR enhancement
and temporal enhancement is made based on four factors: the motion in the
current picture, the separation between pictures, the peak signal-to-noise ratio
(PSNR) gain from layer 2 to layer 3 if SNR scalability were to be chosen, and the
bit rate available for layer 3.

Figure 5 shows an example of a three layer video bit stream using hybrid
SNR/temporal scalability along the lines described in the prior art document
mentioned above.

When SNR scalability mode is selected in layer 3, there is a spatial quality
improvement over the layer 2 picture at the same temporal position. If temporal
scalability is selected for the following picture, the extra information from the old EI
picture in layer 3 is not used. This means that if layer 3 has a much greater bit
rate allocation than layer 2, the layer 3 EI picture may contain significant
additional information, which is wasted.

Furthermore, if a B picture is encoded in layer 3, it is bi-directionally predicted from
the previous and subsequent layer 2 pictures (EI pictures), and therefore is of a
lower spatial quality than neighbouring pictures. These neighbouring pictures may
have been chosen to include SNR enhancement information instead. This is
particularly noticeable when the base and enhancement layer 2 have low bit rates
allocated to them, and enhancement layer 3 has a much greater bit rate allocation.
Hence, not only is a low spatial quality B picture undesirable for the viewer, but a
continual variation in video spatial quality between pictures is also particularly
noticeable. However, since the human visual system considers motion to be
relatively more significant than the spatial quality of an individual picture, it is still
important to include B pictures, especially when a video is to be viewed in slow
motion.

A problem solved by the invention is how to encode B pictures so that they are not
of noticeably worse spatial quality than the enhancement intra (EI) pictures
provided by SNR scalability mode in enhancement layer 3, without exceeding the
target bit rate for any of the layers.

A prior art arrangement is known from published European Patent Application
EP-A-0739140. EP-A-0739140 shows an encoder for an end-to-end scalable
video delivery system. The system employs base and enhancement layers.

A further prior art arrangement is known from published International Patent
Application number WO-A-9933274. WO-A-9933274 shows a scalable predictive
coder for video. This system also employs base and enhancement layers.

Summary of the Invention 

The present invention provides a method of optimising the spatial quality of a
picture produced by temporal scalability for an enhancement layer of a video bit
stream, wherein the picture is predicted based on a picture appearing in the
highest enhancement layer of the bit stream. In this way, if extra information is
already known in the highest enhancement layer, it is not wasted.

Preferably the picture is predicted based only on one picture appearing in the
highest enhancement layer of the bit stream. In theory, however, the prediction
could take place based on additional information contained elsewhere in the bit
stream.

The present invention further provides a method of optimising the spatial quality of
a picture produced by temporal scalability for an enhancement layer of a video bit
stream, wherein the picture is predicted based on a single picture already
appearing in the same enhancement layer of the bit stream. This is quite different
to the prior art, wherein pictures produced by temporal scalability are predicted
based on information contained in two pictures, the previous and subsequent
pictures in the lower enhancement layers.

The prediction of the picture by temporal scalability is preferably achieved using
forward prediction from a previous EI picture in the same enhancement layer.

If an appropriate picture is not available in the same enhancement layer for a
forward prediction to be made, the method of the present invention may result in a
bi-directional prediction being carried out using previous and subsequent lower
layer pictures.

The present invention is particularly applicable to a three layer system, with the
picture produced by temporal scalability according to the present invention
appearing in the third layer, namely the second enhancement layer.

A method according to the present invention may be used when a video bit stream
is prepared for transmission, perhaps via a wireless or mobile communications
system, using a hybrid SNR/temporal scalability method. Spatial and/or object
based scalability may, however, also be involved, either with or without SNR
scalability, as appropriate, and the scalability can be applied to arbitrarily shaped
objects as well as to rectangular objects.

The present invention also provides a system which is adapted to implement the
method according to the present invention described and claimed herein.

Brief Description of the Drawings 



Figure 1 is a schematic illustration of B picture prediction dependencies;

Figure 2 is a schematic representation of an apparatus and method for
undertaking SNR scalability;

Figure 3 is a schematic illustration showing a base layer and an enhancement
layer produced using SNR scalability;

Figure 4 is a schematic illustration showing a base layer and an enhancement
layer produced using spatial scalability;

Figure 5 is a schematic illustration of a three layer hybrid SNR/temporal scalability
application according to the prior art;

Figure 6 is a schematic illustration of a three layer hybrid SNR/temporal scalability
application according to the present invention wherein a picture in the highest
possible enhancement layer is used for B picture prediction;

Figure 7 is a flow diagram depicting the essence of an algorithm according to the
present invention;

Figure 8 is a graph of PSNR for each encoded picture of a QCIF "Foreman"
sequence with B picture prediction from EI pictures in layer 2 according to the prior
art method;

Figure 9 is a graph of PSNR for each encoded picture of a QCIF "Foreman"
sequence with B picture prediction from EI pictures in layer 3 according to the
present invention;

Figure 10 illustrates the general scheme of a wireless communications system
which could take advantage of the present invention; and

Figure 11 illustrates a mobile station (MS) which uses the method according to the
present invention.

Description of a Preferred Embodiment

The present invention is now described, by way of example only, with reference to
Figures 6 to 11 of the accompanying drawings.

Figure 6 shows a three layer video bit stream, wherein layer 1 is a base layer and
layers 2 and 3 are enhancement layers.

The first enhancement layer, layer 2, is produced using SNR enhancement based
on the pictures appearing in layer 1. The layer 3 enhancement is achieved based
on a hybrid SNR/temporal scalability method. The choice between SNR scalability
and temporal scalability is made along the lines described in the
PhD Dissertation of Faisal Ishtiaq discussed above.

As will be seen in Figure 6, two B pictures are shown. The first, B0.5, results from
the algorithm of the method of the present invention forcing the use of a forward
prediction mode based on the preceding layer 3 EI picture (EI0). The preceding
layer 3 EI picture (EI0) was produced by SNR enhancement of the corresponding
layer 2 picture. The idea of forcing a forward prediction to produce a B picture is,
as far as the inventors are aware, completely novel.

Furthermore, the production of picture B0.5 would appear to be somewhat
contradictory to prior art approaches in this environment, because a B picture is by
normal definition "bi-directionally predicted" based on two pictures. This does not
occur in this embodiment of the present invention.

With regard to the second B picture appearing in Figure 6, B1.5, this is produced
based on a bi-directional prediction using the previous and subsequent layer 2 EI
pictures (EI1 and EI2). This is because layer 3 does not include an enhanced
version of the layer 2 picture EI1, and a forward prediction cannot therefore be
made. The layer 2 picture EI1 is simply repeated in layer 3, without any
enhancement. Likewise, as shown in Figure 6, the layer 2 picture EI2 is simply
repeated without any enhancement in layer 3.

As will be appreciated, a layer 3 forward prediction can only occur according to the
present invention if layer 3 includes a picture which has been enhanced over its
corresponding picture in layer 2. Hence, the algorithm of Figure 7, which supports
the present invention, forces a decision as to whether a B picture is to be predicted
from a previous picture in the same layer, or is determined based on a
bi-directional prediction using pictures from a lower layer.

As will be appreciated, the present invention optimises the quality of the B picture
by using the pictures from the highest layer available. If a previous
layer 3 EI picture is available, the B picture is predicted from it, using forward
prediction mode only. This is because no subsequent layer 3 EI picture is available
for allowing bi-directional prediction to be used. If no previous layer 3 EI picture is
available, then the previous and subsequent layer 2 EI pictures are used to
bi-directionally predict the picture.
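The decision described above can be sketched in Python as follows. This is a simplified illustration, not the coder itself: motion compensation and residual coding are omitted, forward prediction is reduced to a copy of the reference, bi-directional prediction to a sample-wise average, and the function and parameter names are hypothetical.

```python
def predict_b_picture(prev_layer3_ei, prev_layer2_ei, next_layer2_ei):
    """Choose the prediction mode for a layer 3 B picture.

    prev_layer3_ei: samples of the previous enhanced layer 3 EI picture,
                    or None when layer 3 merely repeats the layer 2 picture.
    prev_layer2_ei, next_layer2_ei: samples of the surrounding layer 2 EI pictures.
    Returns a (mode, prediction) pair.
    """
    if prev_layer3_ei is not None:
        # An enhanced picture exists in the same layer: forward prediction only.
        return "forward", list(prev_layer3_ei)
    # Otherwise fall back to bi-directional prediction from the layer below.
    averaged = [(a + b) / 2 for a, b in zip(prev_layer2_ei, next_layer2_ei)]
    return "bidirectional", averaged


# Case like B0.5: an enhanced layer 3 EI picture precedes the B picture.
mode_a, pred_a = predict_b_picture([10, 20], [8, 18], [12, 22])
# Case like B1.5: no enhanced layer 3 picture is available.
mode_b, pred_b = predict_b_picture(None, [0, 0], [4, 8])
```

In the first call the sketch selects forward prediction from the layer 3 picture; in the second it falls back to the two layer 2 pictures, mirroring the two branches of the decision.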

As shown by the graphs forming Figures 8 and 9, the present invention improves
the quality (PSNR) of the B pictures by up to 1.5 dB in the cases where it is
possible to predict from a previous layer 3 EI picture. The points in the graph of
Figure 9 that have been circled with dashed lines relate to the B pictures that have
been forward predicted in accordance with the invention. These can be compared to
the circled points in Figure 8. This improvement is most noticeable at low bit rates.
This is when the temporal scalability mode is not selected for every picture, and
forward prediction from the layer 3 EI picture can occur more often since there are
more layer 3 EI pictures encoded.
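For reference, PSNR is conventionally computed as 10·log10(peak² / MSE), where MSE is the mean squared error between the original and decoded samples. A minimal Python sketch follows; 8-bit samples (peak value 255) and flat sample lists are assumptions made for this illustration, and the function name is hypothetical.

```python
import math


def psnr(original, decoded, peak=255):
    """Peak signal-to-noise ratio in dB between two equal-length sample lists."""
    mse = sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical pictures: no coding error at all
    return 10 * math.log10(peak ** 2 / mse)


# Every sample is off by one, so the mean squared error is exactly 1.
example_db = psnr([10, 20, 30, 40], [11, 19, 31, 39])
```

On this scale, a gain of 1.5 dB corresponds to reducing the mean squared error by a factor of about 1.41.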

It should also be appreciated that the improved spatial quality provided by the
present invention is achieved without additional coder/decoder complexity.

Furthermore, the invention is applicable to any layered scalable video transmission
system, including those defined by the MPEG4 and H.263+ standards.

With reference to Figures 8 and 9, the invention was tested with a base layer at
13 kbps, first enhancement layer (layer 2) at 52 kbps and a second enhancement
layer (layer 3) at 104 kbps.

Whilst the above method has been described generally with reference to ad-hoc
systems, it will be clear to the reader that it may apply equally to communications
systems which utilise a managing infrastructure. It will be equally appreciated that
apparatus able to carry out the above method is included within the scope of the
invention. A description of such apparatus is as follows.

An example of a wireless communications system 10 which could take advantage
of the present invention is shown in Figure 10. Mobile stations 12, 14 and 16 of
Figure 10 can communicate with a base station 18. Mobile stations 12, 14 and 16
could be mobile telephones with video facility, video cameras or the like.

Each of the mobile stations shown in Figure 10 can communicate through base
station 18 with one or more other mobile stations. If mobile stations 12, 14 and 16
are capable of direct mode operation, then they may communicate directly with
one another or with other mobile stations, without the communication link passing
through base station 18.

Figure 11 illustrates a mobile station (MS) in accordance with the present
invention. The mobile station (MS) of Figure 11 is a radio communication device,
and may be either a portable or a mobile radio, or a mobile telephone, with video
facility, or a video camera with communications facility.

The mobile station 12 of Figure 11 can transmit sound and/or video signals from a
user of the mobile station. The mobile station comprises a microphone 34, which
provides a sound signal, and a video camera 35, which provides a video signal, for
transmission by the mobile station. The signal from the microphone is transmitted
by transmission circuit 22. Transmission circuit 22 transmits via switch 24 and
antenna 26.

In contrast, the video signal from camera 35 is first processed using a method
according to the present invention by controller 20, which may be a
microprocessor, possibly in combination with a read only memory (ROM) 32,
before passing to the transmission circuit 22 for onward transmission via switch 24
and antenna 26.

ROM 32 is a permanent memory, and may be a non-volatile Electrically Erasable
Programmable Read Only Memory (EEPROM). ROM 32 is connected to
controller 20 via line 30.

The mobile station 12 of Figure 11 also comprises a display 42 and keypad 44,
which serve as part of the user interface circuitry of the mobile station. At least the
keypad 44 portion of the user interface circuitry is activatable by the user. Voice
activation of the mobile station may also be employed. Similarly, other means of
interaction with a user may be used, such as for example a touch sensitive screen.

Signals received by the mobile station are routed by the switch to receiving
circuitry 28. From there, the received signals are routed to controller 20 and audio
processing circuitry 38. A loudspeaker 40 is connected to audio circuit 38.
Loudspeaker 40 forms a further part of the user interface.

A data terminal 36 may be provided. Terminal 36 would provide a signal
comprising data for transmission by transmitter circuit 22, switch 24 and antenna
26. Data received by receiving circuitry 28 may also be provided to terminal 36.
The connection to enable this has been omitted from Figure 11 for clarity of
illustration.

The present invention has been described above purely by way of example, and
modifications of detail may be undertaken by those skilled in the relevant art.

Claims

1. A method of optimising the quality of a picture produced by temporal scalability
for an enhancement layer of a video bit stream, characterised in that the picture
(B0.5) is predicted based on a picture (EI0) appearing in the highest enhancement
layer of the bit stream.

2. A method of optimising the quality of a picture produced by temporal scalability
for an enhancement layer of a video bit stream, characterised in that a picture
(B0.5) is predicted based on a single picture (EI0) already appearing in the same
enhancement layer of the bit stream.

3. A method as claimed in claim 1 or claim 2, wherein
the picture (B0.5) is predicted using a forward prediction method.

4. A method as claimed in any preceding claim, wherein
the picture used for the prediction is an enhanced picture (EI0) over the
corresponding picture (EI0) appearing in the layer below.

5. A method as claimed in any preceding claim, wherein
if an appropriate picture (EI0) is not available to enable the prediction to occur, the
predicted picture (B1.5) is bi-directionally predicted, based on previous and
subsequent pictures (EI1, EI2) in the layer below.

6. A method as claimed in any preceding claim, wherein
the method is used in a three or more layer system, and the picture (B0.5)
produced by temporal scalability appears in the highest layer.

7. A method according to any preceding claim, wherein
the method is used in a multi-layer hybrid SNR/temporal scalability method for
improving a video bit stream.

8. A method according to any preceding claim, wherein
the method is used in a multi-layer hybrid spatial/temporal scalability method for
improving a video bit stream.

9. A method according to any preceding claim, wherein
the method is used in a multi-layer hybrid object based/temporal scalability method
for improving a video bit stream.

10. A system (10) or apparatus (12) for implementing a method according to any
preceding claim, wherein
the system or apparatus includes processor means (20) for optimising the quality
of a picture produced by temporal scalability for an enhancement layer of a video
bit stream prior to transmission.

11. A system or apparatus according to claim 10,
the system (10) or apparatus (12) forming a part of a wireless or mobile
communications system.

12. An apparatus according to claim 10 or claim 11, wherein
the apparatus (12) is a mobile station which incorporates a video camera (35).






[FIG. 3: ILLUSTRATION OF SNR SCALABILITY. Labels: ENHANCEMENT LAYER (EI, EP); BASE LAYER.]

[FIG. 4: ILLUSTRATION OF SPATIAL SCALABILITY. Labels: ENHANCEMENT LAYER (EI, EP); BASE LAYER.]







[FIG. 7: flow diagram.
Decision: IS THE PREVIOUS PICTURE A PICTURE IN THE SAME ENHANCEMENT LAYER
AS THE REQUIRED PICTURE?
YES: FORWARD PREDICT THE B PICTURE BASED ON THE PREVIOUS PICTURE, then OUTPUT.
Otherwise: PRODUCE THE B PICTURE BASED ON A BI-DIRECTIONAL PREDICTION
USING PREVIOUS AND SUBSEQUENT LOWER LAYER PICTURES, then OUTPUT.]







[FIG. 10]

[FIG. 11: mobile station 12. Labels: 34; VIDEO 35; DATA TERMINAL 36; DISPLAY 42;
KEYPAD 44; 26; SWITCH 24; CONTROLLER 20; Tx 22; Rx 28; AUDIO 38; 40; 30; ROM 32.]



Ii >nal Application No 



rv,i/£P 01/07885 



A. CLASSIFICATION OF SUBJECT MATTER , 

IPC 7 H04N7/30 H04N7/32 






According to International Patent Classification (IPC) or to both national classification and IPC 




B. FIELDS SEARCHED 


Minimum documentation searched (classification system followed by classification symbols) 

IPC 7 H04N 


Documentation searched other than minimum documentation to the extent that such documents are included in the fields searched

Electronic data base consulted during the international search (name of data base and, where practical, search terms used)


EPO-Internal 






C. DOCUMENTS CONSIDERED TO BE RELEVANT 


Category   Citation of document, with indication, where appropriate,   Relevant to
           of the relevant passages                                    claim No.

A          WO 99 33274 A (ROSE KENNETH)                                1-12
           1 July 1999 (1999-07-01)
           cited in the application
           the whole document

A          EP 0 739 140 A (SUN MICROSYSTEMS INC)                       1-12
           23 October 1996 (1996-10-23)
           cited in the application
           the whole document

A          EP 0 644 695 A (BELL COMMUNICATIONS RES; AT & T CORP (US))  1-12
           22 March 1995 (1995-03-22)
           the whole document

A          EP 0 634 871 A (AT & T CORP)                                1-12
           18 January 1995 (1995-01-18)
           the whole document


[X] Further documents are listed in the continuation of box C.

[X] Patent family members are listed in annex.


* Special categories of cited documents:

"A" document defining the general state of the art which is not
considered to be of particular relevance

"E" earlier document but published on or after the international
filing date

"L" document which may throw doubts on priority claim(s) or
which is cited to establish the publication date of another
citation or other special reason (as specified)

"O" document referring to an oral disclosure, use, exhibition or
other means

"P" document published prior to the international filing date but
later than the priority date claimed

"T" later document published after the international filing date
or priority date and not in conflict with the application but
cited to understand the principle or theory underlying the
invention

"X" document of particular relevance; the claimed invention
cannot be considered novel or cannot be considered to
involve an inventive step when the document is taken alone

"Y" document of particular relevance; the claimed invention
cannot be considered to involve an inventive step when the
document is combined with one or more other such documents,
such combination being obvious to a person skilled
in the art.

"&" document member of the same patent family


Date of the actual completion of the international search: 5 November 2001

Date of mailing of the international search report: 13/11/2001

Name and mailing address of the ISA:
European Patent Office, P.B. 5818 Patentlaan 2
NL-2280 HV Rijswijk
Tel. (+31-70) 340-2040, Tx. 31 651 epo nl,
Fax: (+31-70) 340-3016

Authorized officer: Gries, T




C. (Continuation) DOCUMENTS CONSIDERED TO BE RELEVANT

Category   Citation of document, with indication, where appropriate,   Relevant to
           of the relevant passages                                    claim No.

           WILDON D ET AL: "ON THE CODING OF THE                       1-12
           ENHANCEMENT LAYER OF AN MPEG-2 SNR
           SCALABLE CODER"
           JOURNAL OF ELECTRONIC IMAGING, SPIE + IS&T, US,
           vol. 7, no. 3, 1 July 1998 (1998-07-01),
           pages 516-526, XP000771759
           ISSN: 1017-9909
           the whole document

           INTERNATIONAL TELECOMMUNICATION UNION ITU                   1-12
           / SULLIVAN, G.: "DRAFT TEXT OF
           RECOMMENDATION H.263 VERSION 2 (H.263+) FOR
           DECISION"
           ITU-T DRAFT RECOMMENDATION H.263 VERSION 2,
           27 January 1998 (1998-01-27), pages 1-153, XP000982914
           cited in the application
           page 2
           page 111 - page 121
           (Annex O: Temporal, SNR, and Spatial Scalability mode)



Information on patent family members 




Patent document           Publication    Patent family      Publication
cited in search report    date           member(s)          date

WO 9933274 A              01-07-1999     AU 1928999 A       12-07-1999
                                         WO 9933274 A1      01-07-1999

EP 0739140 A              23-10-1996     US 5621660 A       15-04-1997
                                         EP 0739140 A2      23-10-1996
                                         JP 9163362 A       20-06-1997
                                         US 5768535 A       16-06-1998

EP 0644695 A              22-03-1995     CA 2127151 A1      22-03-1995
                                         EP 0644695 A2      22-03-1995
                                         JP 7162870 A       23-06-1995

EP 0634871 A              18-01-1995     CA 2126467 A1      14-01-1995
                                         EP 0634871 A2      18-01-1995
                                         JP 7170514 A       04-07-1995
                                         US 5742343 A       21-04-1998



